A Deep Learning-Based CCTV System for Automatic Smoking Detection in Fire Exit Zones
Sadat, Sami, Hossain, Mohammad Irtiza, Sifat, Junaid Ahmed, Rafi, Suhail Haque, Alvi, Md. Waseq Alauddin, Rhaman, Md. Khalilur
This research proposes a deep learning real-time smoking detection system for CCTV surveillance of fire exit areas, motivated by their critical safety requirements. The dataset contained 8,124 images drawn from 20 different scenarios, augmented from 2,708 raw samples that include low-light conditions. We evaluated three advanced object detection models, YOLOv8, YOLOv11, and YOLOv12, and then developed a custom model derived from YOLOv8 with added structures for demanding surveillance contexts. The proposed model outperformed the other evaluated models, reaching a recall of 78.90% and mAP@50 of 83.70%, delivering reliable detection across different environments. Inference performance was evaluated on multiple edge devices using multithreaded operation. The Jetson Xavier NX processed frames at the fastest real-time rate of 52-97 ms, establishing its suitability for time-sensitive operations. The study shows that the proposed system delivers a fair and adjustable platform for monitoring public safety while enabling automatic regulatory compliance checks.
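The recall figure reported above is computed by matching predicted boxes to ground-truth boxes at an IoU threshold of 0.5. A minimal sketch of that matching, in pure Python with hypothetical box tuples (not the authors' evaluation code):

```python
def iou(a, b):
    # boxes as (x1, y1, x2, y2); intersection-over-union of two boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def recall_at_iou(preds, gts, thresh=0.5):
    # greedy one-to-one matching: a ground-truth box counts as detected
    # if some not-yet-matched prediction overlaps it with IoU >= thresh
    matched, hits = set(), 0
    for gt in gts:
        for i, p in enumerate(preds):
            if i not in matched and iou(p, gt) >= thresh:
                matched.add(i)
                hits += 1
                break
    return hits / len(gts) if gts else 1.0
```

mAP@50 additionally sweeps a confidence threshold and averages precision over recall levels; the matching step is the same.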
Visio-Verbal Teleimpedance Interface: Enabling Semi-Autonomous Control of Physical Interaction via Eye Tracking and Speech
Jekel, Henk H. A., Rosales, Alejandro Díaz, Peternel, Luka
The paper presents a visio-verbal teleimpedance interface for commanding 3D stiffness ellipsoids to a remote robot through a combination of the operator's gaze and verbal interaction. The gaze is detected by an eye-tracker, allowing the system to understand the context in terms of what the operator is currently looking at in the scene. A Visual Language Model (VLM) processes this information together with verbal interaction, enabling the operator to communicate their intended action or provide corrections. Based on these inputs, the interface generates appropriate stiffness matrices for different physical interaction actions. To validate the proposed visio-verbal teleimpedance interface, we conducted a series of experiments on a setup in which a Force Dimension Sigma.7 haptic device controls the motion of a remote Kuka LBR iiwa robotic arm. The human operator's gaze is tracked by Tobii Pro Glasses 2, while verbal commands are processed by a VLM using GPT-4o. The first experiment explored the optimal prompt configuration for the interface. The second and third experiments demonstrated different functionalities of the interface on a slide-in-the-groove task.
- Europe > Netherlands > South Holland > Delft (0.04)
- Europe > Switzerland (0.04)
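A stiffness ellipsoid as described above is conventionally encoded as a symmetric positive-definite matrix K = R diag(k) Rᵀ, where k holds the principal stiffnesses and R orients the ellipsoid. A minimal sketch of that construction (a generic illustration, not the paper's interface code; rotation about z only for brevity):

```python
import math

def rot_z(theta):
    # 3x3 rotation about the z axis
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def stiffness_matrix(k_axes, theta_z=0.0):
    """Cartesian stiffness K = R diag(k) R^T: an ellipsoid whose
    principal stiffnesses k_axes are rotated by theta_z about z."""
    R = rot_z(theta_z)
    D = [[k_axes[0], 0, 0], [0, k_axes[1], 0], [0, 0, k_axes[2]]]
    return matmul(matmul(R, D), transpose(R))
```

The interface's job then reduces to choosing k_axes and the orientation from the gaze context and the verbal command.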
Terrain-Aware Adaptation for Two-Dimensional UAV Path Planners
Karakontis, Kostas, Petsanis, Thanos, Kapoutsis, Athanasios Ch., Kapoutsis, Pavlos Ch., Kosmatopoulos, Elias B.
Multi-UAV Coverage Path Planning (mCPP) algorithms in popular commercial software typically treat a Region of Interest (RoI) only as a 2D plane, ignoring important 3D structure characteristics. This leads to incomplete 3D reconstructions, especially around occluded or vertical surfaces. In this paper, we propose a modular algorithm that can extend commercial two-dimensional path planners to facilitate terrain-aware planning by adjusting altitude and camera orientations. To demonstrate it, we extend the well-known DARP (Divide Areas for Optimal Multi-Robot Coverage Path Planning) algorithm and produce DARP-3D. Compared to baseline, our approach consistently captures improved 3D reconstructions, particularly in areas with significant vertical features. An open-source implementation of the algorithm is available here: https://github.com/konskara/T
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.05)
- Europe > Greece > Central Macedonia > Thessaloniki (0.04)
- Europe > Greece > Attica > Athens (0.04)
- Research Report (0.50)
- Workflow (0.46)
- Transportation > Air (0.46)
- Government (0.46)
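The core idea of lifting a 2D plan into a terrain-aware one can be sketched very simply: keep a constant height above ground and tilt the camera toward slopes ahead. This is a generic illustration under assumed conventions (terrain_height callable, pitch in degrees with -90 = nadir), not the DARP-3D implementation:

```python
import math

def terrain_aware_waypoints(path_2d, terrain_height, clearance):
    """Lift each 2D waypoint to a constant above-ground altitude and
    pitch the camera toward steeper terrain ahead.
    terrain_height: callable (x, y) -> ground elevation."""
    waypoints = []
    for i, (x, y) in enumerate(path_2d):
        z = terrain_height(x, y) + clearance
        if i + 1 < len(path_2d):
            # estimate the slope from the segment to the next waypoint
            nx, ny = path_2d[i + 1]
            dz = terrain_height(nx, ny) - terrain_height(x, y)
            dxy = math.hypot(nx - x, ny - y) or 1.0
            # tilt from nadir (-90 deg) toward the slope direction
            pitch = -90.0 + math.degrees(math.atan2(dz, dxy))
        else:
            pitch = -90.0  # last waypoint: look straight down
        waypoints.append((x, y, z, pitch))
    return waypoints
```

On flat terrain this degenerates to the commercial 2D plan (constant altitude, nadir camera), which is why it can wrap an existing planner.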
DJI Air 3S review: LiDAR and improved image quality make for a nearly faultless drone
DJI just announced the dual-camera Air 3S drone and there's some all-new cutting-edge tech hiding in the nose. A LiDAR sensor is there to provide extra crash protection at night, a time that's often dangerous for drones. The Air 3S also has a new main camera with a larger sensor better suited for capturing video in low light. And it now comes with the company's ActiveTrack 360, first introduced in the Mini 4 Pro, allowing the device to zoom all around your subject while tracking and filming them. There are a bunch of other little improvements, from storage to the new panoramic photo mode, all at the same $1,099 price as the Air 3 at launch.
- North America > United States (0.05)
- Europe (0.05)
Optimizing Parking Space Classification: Distilling Ensembles into Lightweight Classifiers
Alves, Paulo Luza, Hochuli, André, de Oliveira, Luiz Eduardo, de Almeida, Paulo Lisboa
When deploying large-scale machine learning models for smart city applications, such as image-based parking lot monitoring, data often must be sent to a central server to perform classification tasks. This is challenging for the city's infrastructure, where image-based applications require transmitting large volumes of data, necessitating complex network and hardware infrastructures to process the data. To address this issue in image-based parking space classification, we propose creating a robust ensemble of classifiers to serve as Teacher models. These Teacher models are distilled into lightweight and specialized Student models that can be deployed directly on edge devices. The knowledge is distilled to the Student models through pseudo-labeled samples generated by the Teacher model, which are utilized to fine-tune the Student models on the target scenario. Our results show that the Student models, with 26 times fewer parameters than the Teacher models, achieved an average accuracy of 96.6% on the target test datasets, surpassing the Teacher models, which attained an average accuracy of 95.3%.
- Education (1.00)
- Transportation > Infrastructure & Services (0.87)
- Transportation > Ground > Road (0.87)
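The pseudo-labeling step described above can be sketched as an ensemble vote with an agreement filter. A minimal illustration in plain Python, with hypothetical callable teachers and a min_agreement threshold of my choosing (the paper's actual distillation pipeline fine-tunes CNN Students on these pairs):

```python
from collections import Counter

def pseudo_label(teacher_preds):
    """Majority vote of the Teacher ensemble on one unlabeled sample;
    returns the winning class and its agreement ratio (a crude confidence)."""
    votes = Counter(teacher_preds)
    label, count = votes.most_common(1)[0]
    return label, count / len(teacher_preds)

def build_distillation_set(samples, teachers, min_agreement=0.8):
    """Keep only samples the ensemble agrees on; these pseudo-labeled
    pairs fine-tune the lightweight Student on the target scenario."""
    distilled = []
    for x in samples:
        label, conf = pseudo_label([t(x) for t in teachers])
        if conf >= min_agreement:
            distilled.append((x, label))
    return distilled
```

Filtering by agreement trades label coverage for label quality, which matters because Student errors compound from noisy pseudo-labels.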
Online Distribution Shift Detection via Recency Prediction
Luo, Rachel, Sinha, Rohan, Sun, Yixiao, Hindy, Ali, Zhao, Shengjia, Savarese, Silvio, Schmerling, Edward, Pavone, Marco
When deploying modern machine learning-enabled robotic systems in high-stakes applications, detecting distribution shift is critical. However, most existing methods for detecting distribution shift are not well-suited to robotics settings, where data often arrives in a streaming fashion and may be very high-dimensional. In this work, we present an online method for detecting distribution shift with guarantees on the false positive rate - i.e., when there is no distribution shift, our system is very unlikely (with probability $< \epsilon$) to falsely issue an alert; any alerts that are issued should therefore be heeded. Our method is specifically designed for efficient detection even with high dimensional data, and it empirically achieves up to 11x faster detection on realistic robotics settings compared to prior work while maintaining a low false negative rate in practice (whenever there is a distribution shift in our experiments, our method indeed emits an alert). We demonstrate our approach in both simulation and hardware for a visual servoing task, and show that our method indeed issues an alert before a failure occurs.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
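The false-positive guarantee described above has a classical analogue: alert only when a test statistic exceeds a concentration bound calibrated to level ε. The sketch below uses a two-sample Hoeffding bound on the mean of values in [0, 1]; this is a generic illustration of FPR-controlled shift detection, not the paper's recency-prediction method:

```python
import math

def hoeffding_threshold(n_ref, n_win, eps):
    """Deviation bound such that, with no shift, the mean gap exceeds
    it with probability < eps (values assumed bounded in [0, 1])."""
    return math.sqrt(0.5 * (1 / n_ref + 1 / n_win) * math.log(2 / eps))

def detect_shift(reference, window, eps=0.05):
    """Alert iff the recent window's mean deviates from the reference
    mean by more than the Hoeffding bound at false-positive level eps."""
    m_ref = sum(reference) / len(reference)
    m_win = sum(window) / len(window)
    return abs(m_win - m_ref) > hoeffding_threshold(
        len(reference), len(window), eps)
```

In the streaming setting, detect_shift would run on a sliding window each time a new sample arrives, with eps split across tests to keep the overall false-positive probability below ε.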
A View Independent Classification Framework for Yoga Postures
Chasmai, Mustafa, Das, Nirjhar, Bhardwaj, Aman, Garg, Rahul
Yoga is a globally acclaimed and widely recommended practice for a healthy living. Maintaining correct posture while performing a Yogasana is of utmost importance. In this work, we employ transfer learning from Human Pose Estimation models for extracting 136 key-points spread all over the body to train a Random Forest classifier which is used for estimation of the Yogasanas. The results are evaluated on an in-house collected extensive yoga video database of 51 subjects recorded from 4 different camera angles. We propose a 3 step scheme for evaluating the generalizability of a Yoga classifier by testing it on 1) unseen frames, 2) unseen subjects, and 3) unseen camera angles. We argue that for most of the applications, validation accuracies on unseen subjects and unseen camera angles would be most important. We empirically analyze over three public datasets, the advantage of transfer learning and the possibilities of target leakage. We further demonstrate that the classification accuracies critically depend on the cross validation method employed and can often be misleading. To promote further research, we have made key-points dataset and code publicly available.
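The paper's point about cross-validation can be made concrete: frame-wise random splits leak near-duplicate frames of the same subject across train and test, so splits must be grouped by subject (and analogously by camera angle). A minimal subject-wise splitter, assuming each sample is a dict with a "subject" key (an illustrative convention, not the authors' code):

```python
def subject_wise_folds(samples, n_folds=3):
    """Split so that no subject appears in both train and test;
    this avoids the inflated accuracy of frame-wise random splits."""
    subjects = sorted({s["subject"] for s in samples})
    folds = [[] for _ in range(n_folds)]
    for i, subj in enumerate(subjects):
        folds[i % n_folds].append(subj)  # round-robin assignment
    splits = []
    for held_out in folds:
        test = [s for s in samples if s["subject"] in held_out]
        train = [s for s in samples if s["subject"] not in held_out]
        splits.append((train, test))
    return splits
```

The same grouping idea is what scikit-learn's GroupKFold implements, if a library version is preferred.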
DeepDarts: Modeling Keypoints as Objects for Automatic Scorekeeping in Darts using a Single Camera
McNally, William, Walters, Pascale, Vats, Kanav, Wong, Alexander, McPhee, John
Existing multi-camera solutions for automatic scorekeeping in steel-tip darts are very expensive and thus inaccessible to most players. Motivated to develop a more accessible low-cost solution, we present a new approach to keypoint detection and apply it to predict dart scores from a single image taken from any camera angle. This problem involves detecting multiple keypoints that may be of the same class and positioned in close proximity to one another. The widely adopted framework for regressing keypoints using heatmaps is not well-suited for this task. To address this issue, we instead propose to model keypoints as objects. We develop a deep convolutional neural network around this idea and use it to predict dart locations and dartboard calibration points within an overall pipeline for automatic dart scoring, which we call DeepDarts. Additionally, we propose several task-specific data augmentation strategies to improve the generalization of our method. As a proof of concept, two datasets comprising 16k images originating from two different dartboard setups were manually collected and annotated to evaluate the system. In the primary dataset containing 15k images captured from a face-on view of the dartboard using a smartphone, DeepDarts predicted the total score correctly in 94.7% of the test images. In a second more challenging dataset containing limited training data (830 images) and various camera angles, we utilize transfer learning and extensive data augmentation to achieve a test accuracy of 84.0%. Because DeepDarts relies only on single images, it has the potential to be deployed on edge devices, giving anyone with a smartphone access to an automatic dart scoring system for steel-tip darts. The code and datasets are available.
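Once the dart keypoints and calibration points are detected, scoring reduces to geometry in the board-centered frame: radius picks the ring, angle picks the sector. A self-contained sketch using regulation board dimensions (the mapping from detected keypoints into this frame is assumed done by the calibration step; this is an illustration, not the DeepDarts code):

```python
import math

# standard sector order clockwise from the top (20 at 12 o'clock)
SECTORS = [20, 1, 18, 4, 13, 6, 10, 15, 2, 17, 3, 19, 7, 16, 8, 11, 14, 9, 12, 5]

# ring radii in mm on a regulation steel-tip board
R_BULL, R_OUTER_BULL = 6.35, 15.9
R_TRIPLE_IN, R_TRIPLE_OUT = 99.0, 107.0
R_DOUBLE_IN, R_DOUBLE_OUT = 162.0, 170.0

def score(x, y):
    """Score of a dart at board-centered (x, y) in mm, y pointing up."""
    r = math.hypot(x, y)
    if r <= R_BULL:
        return 50
    if r <= R_OUTER_BULL:
        return 25
    if r > R_DOUBLE_OUT:
        return 0
    # sector index: angle measured clockwise from the top, 18 deg per sector
    angle = math.degrees(math.atan2(x, y)) % 360.0
    base = SECTORS[int((angle + 9.0) // 18.0) % 20]
    if R_TRIPLE_IN <= r <= R_TRIPLE_OUT:
        return 3 * base
    if R_DOUBLE_IN <= r <= R_DOUBLE_OUT:
        return 2 * base
    return base
```

This is why the calibration keypoints matter as much as the dart keypoints: any error in the board-frame transform shifts the ring and sector boundaries.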
Learning Transferable 3D Adversarial Cloaks for Deep Trained Detectors
Maesumi, Arman, Zhu, Mingkang, Wang, Yi, Chen, Tianlong, Wang, Zhangyang, Bajaj, Chandrajit
This paper presents a novel patch-based adversarial attack pipeline that trains adversarial patches on 3D human meshes. We sample triangular faces on a reference human mesh, and create an adversarial texture atlas over those faces. The adversarial texture is transferred to human meshes in various poses, which are rendered onto a collection of real-world background images. Contrary to the traditional patch-based adversarial attacks, where prior work attempts to fool trained object detectors using appended adversarial patches, this new form of attack is mapped into the 3D object world and back-propagated to the texture atlas through differentiable rendering. As such, the adversarial patch is trained under deformation consistent with real-world materials. In addition, and unlike existing adversarial patches, our new 3D adversarial patch is shown to fool state-of-the-art deep object detectors robustly under varying views, potentially leading to an attacking scheme that is persistently strong in the physical world.
- Information Technology > Security & Privacy (0.89)
- Government > Military (0.89)
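The optimization loop behind such attacks is gradient ascent on the attacker's objective with respect to the texture parameters. The paper backpropagates through a differentiable renderer; the toy sketch below substitutes a finite-difference gradient and a scalar stand-in for the detector's confidence, purely to illustrate the loop (every name here is hypothetical):

```python
def numeric_grad(f, params, h=1e-5):
    """Finite-difference gradient of scalar f at params."""
    grad = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += h
        grad.append((f(bumped) - f(params)) / h)
    return grad

def attack(detector_score, texture, steps=100, lr=0.1):
    """Gradient descent on the detector's confidence: nudge the
    texture parameters until the (toy) detector score drops."""
    tex = list(texture)
    for _ in range(steps):
        g = numeric_grad(detector_score, tex)
        tex = [t - lr * gi for t, gi in zip(tex, g)]
        # keep texture values in a valid [0, 1] range
        tex = [min(1.0, max(0.0, t)) for t in tex]
    return tex
```

In the paper's setting, detector_score is the detection confidence after rendering the textured mesh over a background, so the gradient passes through pose deformation and rendering, which is what makes the patch transferable across views.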
Facebook's AI extracts playable characters from real-world videos
Using these and combined pose data, Pose2Frame distinguishes between character-dependent changes in the scene, like shadows, held items, and reflections, and those that are character-independent, and returns a pair of outputs that are linearly blended with any desired background. To train the AI system, the researchers sourced three videos, each between five and eight minutes long, of a tennis player outdoors, a person swinging a sword indoors, and a person walking. Compared with a neural network model fed a three-minute video of a dancer, they report that their approach successfully handled dynamic elements, such as other people and differences in camera angle, in addition to variations in character clothing. "Each network addresses a computational problem not previously fully met, together paving the way for the generation of video games with realistic graphics," they wrote. "In addition, controllable characters extracted from YouTube-like videos can find their place in the virtual worlds and augmented realities." Facebook isn't the only company investigating AI systems that might aid in game design. Startup Promethean AI employs machine learning to help human artists create art for video games, and Nvidia researchers recently demonstrated a generative model that can create virtual environments using video snippets. Machine learning has also been used to rescue old game textures in retro titles like Final Fantasy VII and The Legend of Zelda: Twilight Princess, and to generate thousands of levels in games like Doom from scratch.
- Leisure & Entertainment > Games > Computer Games (1.00)
- Leisure & Entertainment > Sports > Tennis (0.93)
- Information Technology > Services (0.87)